To convert between DNA, RNA and protein sequences, the DNA and RNA molecules provide the methods

  • get_dna()
  • get_rna()
  • get_protein().

Each method takes an option keyword argument that selects which sequence is converted:

  • option='coding': use the sequence as read directly from the strand (default)
  • option='complementary': use the complement of the coding sequence
  • option='reverse_complementary': use the reverse complement of the coding sequence

The example below applies get_dna() to a DNA molecule in each of the three modes.


In [1]:
from wc_rules.bioseq import DNA, RNA, Protein

inputstr = 'TTGTTATCGTTACCGGGAGTGAGGCGTCCGCGTCCCTTTCAGGTCAAGCGACTGAAAAACCTTGCAGTTGATTTTAAAGCGTATAGAAGACAATACAGA'

# Line 1 is the stored sequence; lines 2-4 are get_dna() in each of the three modes
dna1 = DNA(ambiguous=False).set_sequence(inputstr)
print('1: ' + dna1.get_sequence(as_string=True))
print('2: ' + dna1.get_dna(option='coding', as_string=True))
print('3: ' + dna1.get_dna(option='complementary', as_string=True))
print('4: ' + dna1.get_dna(option='reverse_complementary', as_string=True))


1: TTGTTATCGTTACCGGGAGTGAGGCGTCCGCGTCCCTTTCAGGTCAAGCGACTGAAAAACCTTGCAGTTGATTTTAAAGCGTATAGAAGACAATACAGA
2: TTGTTATCGTTACCGGGAGTGAGGCGTCCGCGTCCCTTTCAGGTCAAGCGACTGAAAAACCTTGCAGTTGATTTTAAAGCGTATAGAAGACAATACAGA
3: AACAATAGCAATGGCCCTCACTCCGCAGGCGCAGGGAAAGTCCAGTTCGCTGACTTTTTGGAACGTCAACTAAAATTTCGCATATCTTCTGTTATGTCT
4: TCTGTATTGTCTTCTATACGCTTTAAAATCAACTGCAAGGTTTTTCAGTCGCTTGACCTGAAAGGGACGCGGACGCCTCACTCCCGGTAACGATAACAA
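
For readers who want to see what the three modes amount to at the string level, here is a minimal plain-Python sketch. It is not the wc_rules implementation: convert_dna and COMPLEMENT are hypothetical names introduced only for illustration, and the sketch assumes an unambiguous A/C/G/T alphabet.

```python
# Illustration only: plain-Python equivalents of the three modes.
# convert_dna and COMPLEMENT are hypothetical names, not part of wc_rules.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def convert_dna(seq, option="coding"):
    """Return seq as-is, complemented, or reverse-complemented."""
    if option == "coding":
        return seq
    if option == "complementary":
        return seq.translate(COMPLEMENT)
    if option == "reverse_complementary":
        # Complement every base, then reverse the order of the bases.
        return seq.translate(COMPLEMENT)[::-1]
    raise ValueError(f"unknown option: {option!r}")


fragment = "TTGTTATCG"  # first nine bases of inputstr
for mode in ("coding", "complementary", "reverse_complementary"):
    print(mode, convert_dna(fragment, mode))
```

For this nine-base fragment, the complementary result matches the start of output line 3 above, and the reverse-complementary result matches the end of output line 4.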

Like get_sequence(), the methods get_dna(), get_rna() and get_protein() can operate on a subsequence specified by (start,end) or (start,length).

For example, to get the reverse-complementary RNA coded in the first 66 bases of dna1 and instantiate a new RNA molecule from it:


In [2]:
seq = dna1.get_rna(option='reverse_complementary', start=0, length=66, as_string=True)
rna1 = RNA().set_sequence(seq)
print(rna1.get_sequence())


UGCAAGGUUUUUCAGUCGCUUGACCUGAAAGGGACGCGGACGCCUCACUCCCGGUAACGAUAACAA
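
The output above is consistent with the subsequence being selected first, then reverse-complemented, then written in the RNA alphabet (T replaced by U). The sketch below reproduces that order of operations in plain Python. The helper rna_from_dna is hypothetical, and zero-based start with an exclusive end is an assumption made for the sketch, not a documented property of wc_rules.

```python
# Illustration only: rna_from_dna is a hypothetical helper, not a wc_rules function.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rna_from_dna(dna, start=0, end=None, length=None, reverse_complementary=False):
    """Slice dna, optionally reverse-complement the slice, then swap T for U."""
    if length is not None:
        end = start + length  # (start, length) is turned into (start, end)
    sub = dna[start:end]      # assumption: zero-based start, exclusive end
    if reverse_complementary:
        sub = sub.translate(DNA_COMPLEMENT)[::-1]
    return sub.replace("T", "U")


inputstr = 'TTGTTATCGTTACCGGGAGTGAGGCGTCCGCGTCCCTTTCAGGTCAAGCGACTGAAAAACCTTGCAGTTGATTTTAAAGCGTATAGAAGACAATACAGA'
print(rna_from_dna(inputstr, start=0, length=66, reverse_complementary=True))
```

Under these assumptions, the printed string is the same 66-base RNA sequence shown in the output above.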

To get the protein sequence coded in the first 66 bases of dna1 and instantiate a new Protein molecule from it:


In [3]:
seq = dna1.get_protein(option='coding', start=0, length=66, as_string=True)
prot1 = Protein().set_sequence(seq)
print(prot1.get_sequence())


LLSLPGVRRPRPFQVKRLKNLA
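
The 22-residue result corresponds to reading the 66 bases as 22 consecutive codons. As a rough cross-check, the sketch below translates the same bases with the standard genetic code in plain Python. Whether wc_rules uses the standard table, and how it treats stop codons or trailing bases, is not stated here, so those choices are assumptions of the sketch. The names CODON_TABLE and translate_dna are hypothetical.

```python
# Illustration only: standard genetic code, codons enumerated in T, C, A, G order.
# CODON_TABLE and translate_dna are hypothetical names, not part of wc_rules.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_dna(dna):
    """Translate codon by codon; stop codons become '*', trailing bases are dropped."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


first_66 = "TTGTTATCGTTACCGGGAGTGAGGCGTCCGCGTCCCTTTCAGGTCAAGCGACTGAAAAACCTTGCA"
print(translate_dna(first_66))
```

For these 66 bases the sketch prints LLSLPGVRRPRPFQVKRLKNLA, the same string as the get_protein() output above.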